
fix: resolve GBK encoding errors on Windows for Chinese content #610

Open
Mars-ending wants to merge 1 commit into OpenBMB:main from Mars-ending:fix-gbk-encoding


@Mars-ending

Problem

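On Windows systems with a Chinese locale, Python's stdout uses GBK encoding by default. This causes a UnicodeEncodeError when:

  1. Model responses contain characters GBK cannot represent, such as emoji or rarer CJK characters (GBK itself covers common Chinese characters)
  2. Logs are written to files via FileHandler without an explicit encoding, so the file falls back to the locale encoding (GBK)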

Error example:
'gbk' codec can't encode character '\U0001f4d6' in position 1189
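The failure can be reproduced without a Windows machine, because it comes from the GBK codec rather than from the console itself. The sketch below (the helper name `gbk_unencodable` is hypothetical, not part of this PR) lists which characters GBK rejects; for a string like the one in the error above, only the emoji U+1F4D6 fails, not the Chinese text.

```python
def gbk_unencodable(text):
    """Return the characters of `text` that the GBK codec cannot encode."""
    bad = []
    for ch in text:
        try:
            ch.encode("gbk")
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


# Chinese text followed by U+1F4D6 (OPEN BOOK), the character from the error above.
sample = "中文\U0001f4d6"
print([hex(ord(c)) for c in gbk_unencodable(sample)])

try:
    sample.encode("gbk")  # what a GBK stdout or FileHandler does implicitly
except UnicodeEncodeError as exc:
    print(exc)  # same kind of message as the error example above
```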

Changes

  1. server_main.py:

    • Wrap sys.stdout/stderr with UTF-8 TextIOWrapper on Windows
    • Add encoding='utf-8' to FileHandler for server.log
  2. utils/structured_logger.py:

    • Add encoding='utf-8' to FileHandler for workflow logs
  3. utils/logger.py:

    • Wrap print() in a try/except that catches UnicodeEncodeError
    • On an encoding error, strip the characters the console cannot encode instead of crashing
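The first two changes can be sketched roughly as follows. This is an illustration under assumptions, not the code in server_main.py or utils/structured_logger.py: the helper names `utf8_text_stream`, `configure_windows_console`, and `make_log_handler` are hypothetical. The idea is to re-wrap the byte buffer behind stdout/stderr with a UTF-8 `io.TextIOWrapper`, and to pass `encoding="utf-8"` to every `logging.FileHandler` so log files no longer depend on the OS locale.

```python
import codecs
import io
import logging
import sys


def utf8_text_stream(stream):
    """Return a UTF-8 text stream over `stream`'s underlying byte buffer.

    Already-UTF-8 streams and streams without a `.buffer` (e.g. io.StringIO
    or a test-capture object) are returned unchanged.
    """
    encoding = getattr(stream, "encoding", None)
    if encoding and codecs.lookup(encoding).name == "utf-8":
        return stream  # idempotent: avoid stacking a second wrapper
    buffer = getattr(stream, "buffer", None)
    if buffer is None:
        return stream
    stream.flush()  # push out anything already encoded with the old codec
    # UTF-8 can encode every code point except lone surrogates;
    # errors="replace" covers that corner case instead of raising.
    return io.TextIOWrapper(buffer, encoding="utf-8", errors="replace",
                            line_buffering=True)


def configure_windows_console():
    """Switch sys.stdout/sys.stderr to UTF-8 on Windows; no-op elsewhere."""
    if sys.platform == "win32":
        sys.stdout = utf8_text_stream(sys.stdout)
        sys.stderr = utf8_text_stream(sys.stderr)


def make_log_handler(path):
    """FileHandler that always writes UTF-8, regardless of the OS locale."""
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    return handler
```

On Python 3.7+, `sys.stdout.reconfigure(encoding="utf-8")` changes the existing wrapper in place, which avoids two wrappers sharing one buffer; running with UTF-8 Mode (`PYTHONUTF8=1`, PEP 540) has a similar effect process-wide, including the default encoding used by `open()`.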

Testing

Verified on Windows 11 with a Chinese locale:

  • Workflow with Chinese task prompts now completes without encoding errors
  • Generated files correctly contain Unicode characters (CJK, emoji)

---
Co-Authored-By: Claude <noreply@anthropic.com>
@huatl98
Collaborator

huatl98 commented Apr 20, 2026

Thanks for the fix! One case still seems to be uncovered.

The standalone console fallback in WorkflowLogger still seems incomplete. utils/logger.py adds a fallback for UnicodeEncodeError, but I could still reproduce a failure locally: on a Windows GBK/CP936 console, using WorkflowLogger directly to output text containing emoji or other non-GBK characters still raises an encoding error. I reproduced it with the string 中文🙂.

So the main path launched through server_main.py appears to be fixed, but standalone usage of WorkflowLogger may still have a gap.

Please consider improving the WorkflowLogger fallback so it can safely output Unicode content even when used independently of server_main.py.
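One possible way to close this gap is to base the fallback only on the stream being written to, so it behaves the same whether or not server_main.py has re-wrapped sys.stdout. The sketch below is an assumption, not the code in utils/logger.py: `safe_console_write` and `gbk_console` are hypothetical names. It also swaps the PR's "strip" behavior for the `backslashreplace` error handler, so an unencodable emoji shows up as an escape instead of disappearing.

```python
import io
import sys


def safe_console_write(text, stream=None):
    """Print one line without letting UnicodeEncodeError escape.

    Works on whatever stream it is given, so it does not rely on
    server_main.py having re-wrapped sys.stdout beforehand.
    """
    text = str(text)
    if stream is None:
        stream = sys.stdout  # resolved at call time, so redirection still works
    try:
        print(text, file=stream)
    except UnicodeEncodeError:
        encoding = getattr(stream, "encoding", None) or "utf-8"
        # backslashreplace keeps unencodable characters visible as escapes
        # (U+1F642 becomes \U0001f642) rather than silently dropping them.
        fallback = text.encode(encoding, errors="backslashreplace").decode(encoding)
        print(fallback, file=stream)
    stream.flush()


def gbk_console():
    """In-memory stand-in for a GBK/CP936 console: returns (raw bytes, text stream)."""
    raw = io.BytesIO()
    return raw, io.TextIOWrapper(raw, encoding="gbk", newline="\n")
```

For example, `raw, console = gbk_console()` followed by `safe_console_write("中文🙂", console)` leaves the GBK bytes for `中文\U0001f642` plus a newline in `raw`, where a plain `print` would have raised.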

It would also be helpful to add regression tests covering:

  • Direct WorkflowLogger output with emoji or other non-GBK characters
  • UTF-8 log file writing
  • Unicode output through the server_main.py startup path
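A rough `unittest` sketch of the requested regression tests follows, under stated assumptions: the class, fixture, and logger names are hypothetical, and an in-memory GBK `TextIOWrapper` stands in for a real Windows console. It covers UTF-8 log-file writing and the UTF-8 re-wrap used on the server_main.py startup path, plus a guard that the fixture really reproduces the 中文🙂 error. A direct WorkflowLogger case would route its console output into `gbk_console()` and assert on the raw bytes, but WorkflowLogger's constructor is not shown in this thread, so that test is not sketched here.

```python
import io
import logging
import os
import tempfile
import unittest


def gbk_console():
    """In-memory stand-in for a Windows GBK/CP936 console (hypothetical fixture)."""
    raw = io.BytesIO()
    return raw, io.TextIOWrapper(raw, encoding="gbk", newline="\n")


class UnicodeOutputTests(unittest.TestCase):
    SAMPLE = "中文\U0001f642"  # the reviewer's reproduction string, 中文🙂

    def test_fixture_reproduces_gbk_error(self):
        # Guards the fixture itself: a plain GBK stream must reject the emoji.
        _, console = gbk_console()
        with self.assertRaises(UnicodeEncodeError):
            console.write(self.SAMPLE)

    def test_utf8_wrapper_over_gbk_console(self):
        # Mirrors the startup path: re-wrap the console's byte buffer as UTF-8.
        raw, console = gbk_console()
        utf8_out = io.TextIOWrapper(console.buffer, encoding="utf-8", newline="\n")
        utf8_out.write(self.SAMPLE + "\n")
        utf8_out.flush()
        self.assertEqual(raw.getvalue().decode("utf-8"), self.SAMPLE + "\n")

    def test_file_handler_writes_utf8(self):
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "workflow.log")
            logger = logging.getLogger("unicode-regression-test")
            logger.propagate = False  # keep the record out of the root handlers
            handler = logging.FileHandler(path, encoding="utf-8")
            handler.setFormatter(logging.Formatter("%(message)s"))
            logger.addHandler(handler)
            try:
                logger.warning(self.SAMPLE)
            finally:
                logger.removeHandler(handler)
                handler.close()  # release the file so Windows can delete the temp dir
            with open(path, encoding="utf-8") as f:
                self.assertEqual(f.read(), self.SAMPLE + "\n")
```

Saved as a test module, this runs with `python -m unittest`; using an in-memory GBK stream lets it run on any OS, not only on a Chinese-locale Windows machine.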
